Researcher Details - MASADA Tomonari (正田　備也)

Last updated: 2025/10/09

マサダ　トモナリ

正田　備也

MASADA Tomonari

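*Items that the university updates periodically (all other items are reproduced from the information registered on researchmap)

Affiliation*

Graduate School of Artificial Intelligence and Science, Major in Artificial Intelligence and Science, Master's Program (博士課程前期課程)
College of Economics, Department of Economics
Graduate School of Artificial Intelligence and Science, Major in Artificial Intelligence and Science, Doctoral Program (博士課程後期課程)

Position*

Professor

Degrees

Bachelor of Science (The University of Tokyo) / Doctor of Information Science and Technology (The University of Tokyo) / Master of Science (The University of Tokyo) / Master of Arts and Sciences (The University of Tokyo)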

Homepage

https://tomonari-masada.github.io/

Research keywords

Probabilistic models

Text mining

Machine learning

Data mining

Courses taught*

Academic year 2025

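Career at the university*

April 2022 - present

Professor, Master's Program (博士課程前期課程), Major in Artificial Intelligence and Science, Graduate School of Artificial Intelligence and Science

April 2022 - present

Professor, Doctoral Program (博士課程後期課程), Major in Artificial Intelligence and Science, Graduate School of Artificial Intelligence and Science

April 2020 - present

Professor, Department of Economics, College of Economics

April 2020 - March 2022

Professor, Master's Program (修士課程), Major in Artificial Intelligence and Science, Graduate School of Artificial Intelligence and Science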

Research fields

Informatics / Theory of informatics
Informatics / Intelligent informatics
Informatics / Databases

Career

April 2020 - present

Professor, Graduate School of Artificial Intelligence and Science, Rikkyo University

April 2012 - March 2020

Associate Professor, Graduate School of Engineering, Nagasaki University

Country: Japan

2008 - 2012

Assistant Professor, Electrical and Electronic (電気情報工学講座), Faculty of Engineering, Nagasaki University

2007 - 2008

Assistant Professor, Computer and Information Sciences (情報システム工学科), Faculty of Engineering, Nagasaki University

October 1999 - September 2001

Engineering Staff, Optical Design Department, Fuji Photo Optical Co., Ltd. (富士写真光機株式会社)

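Education

- 2004

Department of Information and Communication Engineering (電子情報学専攻), Graduate School of Information Science and Technology, The University of Tokyo

Country: Japan

- 1999

Department of Multi-Disciplinary Sciences (広域科学専攻), Graduate School of Arts and Sciences, The University of Tokyo

Country: Japan

- 1995

Department of Information Science (情報科学専攻), Graduate School of Science, The University of Tokyo

Country: Japan

- 1993

Department of Information Science (情報科学科), Faculty of Science, The University of Tokyo

Country: Japan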

Awards

February 2020

Gold Prize, 2nd "SIIQ Technology Grand Prize" (SIIQ技術大賞) of FY2019 (Reiwa 1), Kyushu Semiconductor and Electronics Innovation Council (九州半導体・エレクトロニクスイノベーション協議会)

MASADA Tomonari

May 2018

Best Oral Presentation, Science and Engineering Institute: "Document Modeling with Implicit Approximate Posterior Distributions"

MASADA Tomonari

June 2011

INSTICC Best Paper Award: "DOCUMENTS AS A BAG OF MAXIMAL SUBSTRINGS - An Unsupervised Feature Extraction for Document Clustering"

2006

IPSJ Best Paper Award (情報処理学会論文賞)

Country of award: Japan

2003

DEWS Excellent Presentation Award (DEWS優秀プレゼンテーション賞)

Country of award: Japan

Papers

Feature Extraction from Equipment Sensor Signals with Time Series Clustering and Its Application to Defect Prediction

Daisuke Hamaguchi, Tomonari Masada, Takumi Eguchi

IEEE International Symposium on Semiconductor Manufacturing Conference Proceedings 2020-, December 15, 2020

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Institute of Electrical and Electronics Engineers Inc.

In semiconductor manufacturing processes, it is important to quickly identify any signs of the occurrence of defects. We applied a time-series clustering method to the signal data of processing equipment and obtained information related to the occurrence of defects. By using the information as the feature values of a prediction model, we were able to predict defects more accurately than by using only conventional feature values.

DOI： 10.1109/ISSM51728.2020.9377525

Myanmar Text-to-Speech System based on Tacotron-2.

Yuzana Win, Tomonari Masada

International Conference on Information and Communication Technology Convergence (ICTC), pp. 578-583, 2020

Publication type: Research paper (international conference proceedings) / Publisher: IEEE

DOI： 10.1109/ICTC49870.2020.9289599

Other link: https://dblp.uni-trier.de/db/conf/ictc/ictc2020.html#WinM20

Myanmar Text-to-Speech System based on Tacotron (End-to-End Generative Model).

Yuzana Win, Htoo Pyae Lwin, Tomonari Masada

International Conference on Information and Communication Technology Convergence (ICTC), pp. 572-577, 2020

Publication type: Research paper (international conference proceedings) / Publisher: IEEE

DOI： 10.1109/ICTC49870.2020.9289277

Other link: https://dblp.uni-trier.de/db/conf/ictc/ictc2020.html#WinLM20

Context-Dependent Token-Wise Variational Autoencoder for Topic Modeling. [Peer-reviewed]

Tomonari Masada

Current Trends in Web Engineering - ICWE 2019 International Workshops, pp. 35-47, 2019

Publication type: Research paper (international conference proceedings) / Publisher: Springer

DOI： 10.1007/978-3-030-51253-8_6

Other link: https://dblp.uni-trier.de/db/conf/icwe/icwe2019w.html#Masada19

Difference between Similars: A Novel Method to Use Topic Models for Sensor Data Analysis. [Peer-reviewed]

Tomonari Masada, Takumi Eguchi, Daisuke Hamaguchi

2019 International Conference on Data Mining Workshops, pp. 391-398, 2019

Publication type: Research paper (international conference proceedings) / Publisher: IEEE

DOI： 10.1109/ICDMW.2019.00064

Other link: https://dblp.uni-trier.de/db/conf/icdm/icdm2019w.html#MasadaEH19

Mini-Batch Variational Inference for Time-Aware Topic Modeling. [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, August 28-31, 2018, Proceedings, Part II, pp. 156-164, 2018

Role: Lead author / Publisher: Springer

DOI： 10.1007/978-3-319-97310-4_18

LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10862, pp. 395-402, 2018

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer Verlag

This paper proposes a method of scoring sequences generated by recurrent neural network (RNN) for automatic Tanka composition. Our method gives sequences a score based on topic assignments provided by latent Dirichlet allocation (LDA). When many word tokens in a sequence are assigned to the same topic, we give the sequence a high score. While a scoring of sequences can also be achieved by using RNN output probabilities, the sequences having large probabilities are likely to share much the same subsequences and thus are doomed to be deprived of diversity. The experimental results, where we scored Japanese Tanka poems generated by RNN, show that the top-ranked sequences selected by our method were likely to contain a wider variety of subsequences than those selected by RNN output probabilities.

DOI： 10.1007/978-3-319-93713-7_33

Document Modeling with Implicit Approximate Posterior Distributions. [Peer-reviewed]

Tomonari Masada

Proceedings of the International Conference on Data Processing and Applications, ICDPA 2018, Guangdong, China, May 12-14, 2018, pp. 45-48, 2018

Publisher: ACM

DOI： 10.1145/3224207.3224214

Adversarial Learning for Topic Models. [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Advanced Data Mining and Applications - 14th International Conference, ADMA 2018, Nanjing, China, November 16-18, 2018, Proceedings, pp. 292-302, 2018

Publisher: Springer

DOI： 10.1007/978-3-030-05090-0_25

Estimating Word probabilities with neural networks in latent dirichlet allocation [Peer-reviewed]

Tomonari Masada

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10526, pp. 129-137, 2017

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer Verlag

This paper proposes a new method for estimating the word probabilities in latent Dirichlet allocation (LDA). LDA uses a Dirichlet distribution as the prior for the per-document topic discrete distributions. While another Dirichlet prior can be introduced for the per-topic word discrete distributions, point estimations may lead to a better evaluation result, e.g. in terms of test perplexity. This paper proposes a method for the point estimation of the per-topic word probabilities in LDA by using multilayer perceptron (MLP). Our point estimation is performed in an online manner by mini-batch gradient ascent. We compared our method to the baseline method using a perceptron with no hidden layers and also to the collapsed Gibbs sampling (CGS). The evaluation experiment showed that the test perplexity of CGS could not be improved in almost all cases. However, there certainly were situations where our method achieved a better perplexity than the baseline. We also discuss a usage of our method as word embedding.

DOI： 10.1007/978-3-319-67274-8_12

Exploring OOV Words from Myanmar Text Using Maximal Substrings [Peer-reviewed]

Yuzana Win, Tomonari Masada

PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016, pp. 657-663, 2016

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE

This paper proposes a method for exploring out-of-vocabulary (OOV) words from Myanmar text by using maximal substrings. Our main purpose is to find OOV words that can be added into the Myanmar dictionary. The outcomes of our method are new compound words that do not exist in the Myanmar dictionary. Our method consists of two steps. In the first step, we extract maximal substrings, i.e., the substrings whose number of occurrences decreases when a character is appended before or after them, from Myanmar news articles. In the second step, we post-process the maximal substrings, because the results obtained by maximal substrings contain noisy characters. Our post-processing is threefold. First, we reduce the number of maximal substrings. Second, we remove maximal substrings whose prefixes and suffixes are meaningless characters. Third, we find OOV words that are the substrings consisting of two words from the existing dictionary. Consequently, we obtain the substrings as candidates for new compound words that can be inserted into the existing Myanmar dictionary after being scrutinized by native speakers. We evaluate the accuracy of new compound words from a subjective perspective. It is found that our results do seem promising. We argue that new compound words obtained by our method are useful for expressing the words as a single unit of meaning that can be utilized in Myanmar text effectively.

DOI： 10.1109/IIAI-AAI.2016.73

Other link: http://dblp.uni-trier.de/db/conf/iiaiaai/iiaiaai2016.html#conf/iiaiaai/WinM16

Extraction of Proper Names from Myanmar Text Using Latent Dirichlet Allocation [Peer-reviewed]

Yuzana Win, Tomonari Masada

2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), pp. 96-103, 2016

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE

This paper proposes a method for proper name extraction from Myanmar text by using latent Dirichlet allocation (LDA). Our method aims to extract proper names that provide important information on the contents of Myanmar text. Our method consists of two steps. In the first step, we extract topic words from Myanmar news articles by using LDA. In the second step, we perform post-processing, because the resulting topic words contain some noisy words. Our post-processing, first of all, eliminates the topic words whose prefixes are Myanmar digits and suffixes are noun and verb particles. We then remove the duplicate words and discard the topic words that are contained in the existing dictionary. Consequently, we obtain the words as candidates for proper names, namely personal names, geographical names, unique object names, organization names, single event names, and so on. The evaluation is performed both from the subjective and quantitative perspectives. From the subjective perspective, we compare the accuracy of proper names extracted by our method with those extracted by latent semantic indexing (LSI) and a rule-based method. It is shown that both LSI and our method can improve the accuracy of those obtained by the rule-based method. However, our method can provide more interesting proper names than LSI. From the quantitative perspective, we use the extracted proper names as additional features in K-means clustering. The experimental results show that the document clusters given by our method are better than those given by LSI and the rule-based method in precision, recall and F-score.

DOI： 10.1109/TAAI.2016.7880176

A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2016, PT IV, vol. 9789, pp. 232-245, 2016

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER INT PUBLISHING AG

This paper proposes a new inference for the latent Dirichlet allocation (LDA) [4]. Our proposal is an instance of the stochastic gradient variational Bayes (SGVB) [9,13]. SGVB is a general framework for devising posterior inferences for Bayesian probabilistic models. Our aim is to show the effectiveness of SGVB by presenting an example of SGVB-type inference for LDA, the best-known Bayesian model in text mining. The inference proposed in this paper is easy to implement from scratch. A special feature of the proposed inference is that the logistic normal distribution is used to approximate the true posterior. This is counterintuitive, because we obtain the Dirichlet distribution by taking the functional derivative when we lower bound the log evidence of LDA after applying a mean field approximation. However, our experiment showed that the proposed inference gave a better predictive performance in terms of test set perplexity than the inference using the Dirichlet distribution for posterior approximation. While the logistic normal is more complicated than the Dirichlet, SGVB makes the manipulation of the expectations with respect to the posterior relatively easy. The proposed inference was better even than the collapsed Gibbs sampling [6] for not all but many settings consulted in our experiment. It must be worthwhile future work to devise a new inference based on SGVB also for other Bayesian models.

DOI： 10.1007/978-3-319-42089-9_17

A simple stochastic gradient variational bayes for the correlated topic model [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9932, pp. 424-428, 2016

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer Verlag

This paper proposes a new inference for the correlated topic model (CTM) [3]. CTM is an extension of LDA [4] for modeling correlations among latent topics. The proposed inference is an instance of the stochastic gradient variational Bayes (SGVB) [7,8]. By constructing the inference network with the diagonal logistic normal distribution, we achieve a simple inference. Especially, there is no need to invert the covariance matrix explicitly. We performed a comparison with LDA in terms of predictive perplexity. The two inferences for LDA are considered: the collapsed Gibbs sampling (CGS) [5] and the collapsed variational Bayes with a zero-order Taylor expansion approximation (CVB0) [1]. While CVB0 for LDA gave the best result, the proposed inference achieved the perplexities comparable with those of CGS for LDA.

DOI： 10.1007/978-3-319-45817-5_39

Heuristic Pretraining for Topic Models [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

CURRENT APPROACHES IN APPLIED ARTIFICIAL INTELLIGENCE, vol. 9101, pp. 351-360, 2015

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

This paper provides a heuristic pretraining for topic models. While we consider latent Dirichlet allocation (LDA) here, our pretraining can be applied to other topic models. Basically, we use collapsed Gibbs sampling (CGS) to update the latent variables. However, after every iteration of CGS, we regard the latent variables as observable and construct another LDA over them, which we call LDA over LDA (LoL). We then perform the following two types of updates: the update of the latent variables in LoL by CGS and the update of the latent variables in LDA based on the result of the preceding update of the latent variables in LoL. We perform one iteration of CGS for LDA and the above two types of updates alternately only for a small, earlier part of the inference. That is, the proposed method is used as a pretraining. The pretraining stage is followed by the usual iterations of CGS for LDA. The evaluation experiment shows that our pretraining can improve test set perplexity.

DOI： 10.1007/978-3-319-19066-2_34

Traffic Speed Data Investigation with Hierarchical Modeling [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015, vol. 9446, pp. 123-134, 2015

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER INT PUBLISHING AG

This paper presents a novel topic model for traffic speed analysis in the urban environment. Our topic model is special in that the parameters for encoding the following two domain-specific aspects of traffic speeds are introduced. First, traffic speeds are measured by the sensors each having a fixed location. Therefore, it is likely that similar measurements will be given by the sensors located close to each other. Second, traffic speeds show a 24-hour periodicity. Therefore, it is likely that similar measurements will be given at the same time point on different days. We model these two aspects with Gaussian process priors and make topic probabilities location- and time-dependent. In this manner, our model utilizes the metadata of the traffic speed data. We offer a slice sampling to achieve less approximation than variational Bayesian inferences. We present an experimental result where we use the traffic speed data provided by New York City.

DOI： 10.1007/978-3-319-26135-5_10

Exploring Technical Phrase Frames from Research Paper Titles [Peer-reviewed]

Yuzana Win, Tomonari Masada

2015 IEEE 29TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS WAINA 2015, pp. 558-563, 2015

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE

This paper proposes a method for exploring technical phrase frames by extracting word n-grams that match our information needs and interests from research paper titles. Technical phrase frames, the outcome of our method, are phrases with wildcards that may be substituted for any technical term. Our method, first of all, extracts word trigrams from research paper titles and constructs a co-occurrence graph of the trigrams. Even by simply applying the PageRank algorithm to the co-occurrence graph, we obtain the trigrams that can be regarded as technical keyphrases at the higher ranks in terms of PageRank score. In contrast, our method assigns weights to the edges of the co-occurrence graph based on Jaccard similarity between trigrams and then applies the weighted PageRank algorithm. Consequently, we obtain widely different but more interesting results. While the top-ranked trigrams obtained by unweighted PageRank have just a self-contained meaning, those obtained by our method are technical phrase frames, i.e., a word sequence that forms a complete technical phrase only after putting a technical word (or words) before and/or after it. We claim that our method is a useful tool for discovering important phraseological patterns, which can expand query keywords for improving information retrieval performance and can also work as candidate phrasings in technical writing to make our research papers attractive.

DOI： 10.1109/WAINA.2015.37

ChronoSAGE: Diversifying Topic Modeling Chronologically [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, vol. 8485, pp. 476-479, 2014

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

This paper provides an application of sparse additive generative models (SAGE) for temporal topic analysis. In our model, called ChronoSAGE, topic modeling results are diversified chronologically by using document timestamps. That is, word tokens are generated not only in a topic-specific manner, but also in a time-specific manner. We firstly compare ChronoSAGE with latent Dirichlet allocation (LDA) in terms of pointwise mutual information to show its practical effectiveness. We secondly give an example of time-differentiated topics, obtained by ChronoSAGE as word lists, to show its usefulness in trend detection.

DOI： 10.1007/978-3-319-08010-9_51

A topic model for traffic speed data analysis [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8482 (2), pp. 68-77, 2014

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer Verlag

We propose a probabilistic model for traffic speed data. Our model inherits two key features from latent Dirichlet allocation (LDA). Firstly, unlike e.g. stock market data, lack of data is often perceived for traffic speed data due to unexpected failure of sensors or networks. Therefore, we regard speed data not as a time series, but as an unordered multiset in the same way as LDA regards documents not as a sequence, but as a bag of words. This also enables us to analyze co-occurrence patterns of speed data regardless of their positions along the time axis. Secondly, we regard a daily set of speed data gathered from the same sensor as a document and model it not with a single distribution, but with a mixture of distributions as in LDA. While each such distribution is called topic in LDA, we call it patch to remove text-mining connotation and name our model Patchy. This approach enables us to model speed co-occurrence patterns effectively. However, speed data are non-negative real. Therefore, we use Gamma distributions in place of multinomial distributions. Due to these two features, Patchy can reveal context dependency of traffic speed data. For example, a 60 mph observed on Sunday can be assigned to a patch different from that to which a 60 mph on Wednesday is assigned. We evaluate this context dependency through a binary classification task, where test data are classified as either weekday data or not. We use real traffic speed data provided by New York City and compare Patchy with the baseline method, where a simpler data model is applied. © 2014 Springer International Publishing Switzerland.

DOI： 10.1007/978-3-319-07467-2_8

Explaining Prices by Linking Data: A Pilot Study on Spatial Regression Analysis of Apartment Rents [Peer-reviewed]

Bin Shen, Tomonari Masada

2014 IEEE 3RD GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE), pp. 188-189, 2014

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE

This paper reports a pilot study where we link different types of data for explaining prices. In this study, we link the apartment rent data with the publicly accessible location data of landmarks like supermarkets, hospitals, etc. We apply the regression analysis to find the most important factor determining the apartment rents. We claim that the results of this type of spatial data mining can enhance the user experience in the apartment search system, because we can indicate a rationale behind pricing as additional information to users and thus can make them more confident in their choices.

DOI： 10.1109/GCCE.2014.7031088

Collaborator Recommendation for Isolated Researchers [Peer-reviewed]

Tin Huynh, Atsuhiro Takasu, Tomonari Masada, Kiem Hoang

2014 28TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (WAINA), pp. 639-644, 2014

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE

Successful research collaborations may facilitate major outcomes in science and their applications. Thus, identifying effective collaborators may be a key factor that affects success. However, it is very difficult to identify potential collaborators and it is particularly difficult for young researchers who have less knowledge about other researchers and experts in their research domain. This study introduces and defines the problem of collaborator recommendation for 'isolated' researchers who have no links with others in coauthor networks. Existing approaches such as link-based and content-based methods may not be suitable for isolated researchers because of their lack of links and content information. Thus, we propose a new approach that uses additional information as new features to make recommendations, i.e., the strength of the relationship between organizations, the importance rating, and the activity scores of researchers. We also propose a new method for evaluating the quality of collaborator recommendations. We performed experiments by crawling publications from the Microsoft Academic Search website. The metadata were extracted from these publications, including the year, authors, organizational affiliations of authors, citations, and references. The metadata from publications between 2001 and 2005 were used as the training data while those from 2006 to 2011 were used for validation. The experimental results demonstrated the effectiveness and efficiency of our proposed approach.

DOI： 10.1109/WAINA.2014.105

Trimming prototypes of handwritten digit images with subset infinite relational model [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Lecture Notes in Electrical Engineering, vol. 240, pp. 129-134, 2013

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer

We propose a new probabilistic model for constructing efficient prototypes of handwritten digit images. We assume that all digit images are of the same size and obtain one color histogram for each pixel by counting the number of occurrences of each color over multiple images. For example, when we conduct the counting over the images of digit "5", we obtain a set of histograms as a prototype of digit "5". After normalizing each histogram to a probability distribution, we can classify an unknown digit image by multiplying probabilities of the colors appearing at each pixel of the unknown image. We regard this method as the baseline and compare it with a method using our probabilistic model called Multinomialized Subset Infinite Relational Model (MSIRM), which gives a prototype, where color histograms are clustered column- and row-wise. The number of clusters is adjusted flexibly with Chinese restaurant process. Further, MSIRM can detect irrelevant columns and rows. An experiment, comparing our method with the baseline and also with a method using Dirichlet process mixture, revealed that MSIRM could neatly detect irrelevant columns and rows at peripheral part of digit images. That is, MSIRM could "trim" irrelevant part. By utilizing this trimming, we could speed up classification of unknown images. © 2013 Springer Science+Business Media Dordrecht(Outside the USA).

DOI： 10.1007/978-94-007-6738-6_16

A revised inference for correlated topic model [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7952 (2), pp. 445-454, 2013

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer

In this paper, we provide a revised inference for correlated topic model (CTM) [3]. CTM is proposed by Blei et al. for modeling correlations among latent topics more expressively than latent Dirichlet allocation (LDA) [2] and has been attracting attention of researchers. However, we have found that the variational inference of the original paper is unstable due to almost-singularity of the covariance matrix when the number of topics is large. This means that we may be reluctant to use CTM for analyzing a large document set, which may cover a rich diversity of topics. Therefore, we revise the inference and improve its quality. First, we modify the formula for updating the covariance matrix in a manner that enables us to recover the original inference by adjusting a parameter. Second, we regularize posterior parameters for reducing a side effect caused by the formula modification. While our method is based on a heuristic intuition, an experiment conducted on large document sets showed that it worked effectively in terms of perplexity. © 2013 Springer-Verlag Berlin Heidelberg.

DOI： 10.1007/978-3-642-39068-5_54

Three-way nonparametric Bayesian clustering for handwritten digit image classification [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8228 (3), pp. 149-156, 2013

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: Springer

This paper proposes a new approach for handwritten digit image classification using a nonparametric Bayesian probabilistic model, called multinomialized subset infinite relational model (MSIRM). MSIRM realizes a three-way clustering, i.e., a simultaneous clustering of digit images, pixel columns, and pixel rows, where the numbers of clusters are adjusted automatically with Chinese restaurant process (CRP). We obtain MSIRM as a modification of subset infinite relational model (SIRM) by Ishiguro et al. [4]. While this modification is straightforward, our application of MSIRM to handwritten digit image classification leads to an impressive result. To represent a large number of training digit images in a compact form, we cluster the training images and then classify a test image to the class of the cluster most similar to the test image. By extending this line of thought, MSIRM clusters not only digit images but also pixel columns and pixel rows to obtain a more compact representation. With this three-way clustering, we achieved 2.95% and 5.38% test error rates for MNIST and USPS datasets, respectively. © Springer-Verlag 2013.

DOI： 10.1007/978-3-642-42051-1_20

Clustering Documents with Maximal Substrings [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

ENTERPRISE INFORMATION SYSTEMS, ICEIS 2011, vol. 102, pp. 19-34, 2012

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

This paper provides experimental results showing that we can use maximal substrings as elementary building blocks of documents in place of the words extracted by a current state-of-the-art supervised word extraction. Maximal substrings are defined as the substrings each giving a smaller number of occurrences even by appending only one character to its head or tail. The main feature of maximal substrings is that they can be extracted quite efficiently in an unsupervised manner. We extract maximal substrings from a document set and represent each document as a bag of maximal substrings. We also obtain a bag of words representation by using a state-of-the-art supervised word extraction over the same document set. We then apply the same document clustering method to both representations and obtain two clustering results for a comparison of their quality. We adopt a Bayesian document clustering based on Dirichlet compound multinomials for avoiding overfitting. Our experiment shows that the clustering quality achieved with maximal substrings is acceptable enough to use them in place of the words extracted by a supervised word extraction.

DOI： 10.1007/978-3-642-29958-2_2

Extraction of topic evolutions from references in scientific articles and its GPU acceleration [Peer-reviewed]

Tomonari Masada, Atsuhiro Takasu

ACM International Conference Proceeding Series, pp. 1522-1526, 2012

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: ACM

This paper provides a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics. Recent trends in text mining point to a high demand for exploiting metadata. Especially, exploitation of reference relationships among documents induced by hyperlinking Web pages, citing scientific articles, tumblring blog posts, retweeting tweets, etc., is put in the foreground of the effort for an effective mining. We focus on scholarly activities and propose a topic model for obtaining a corpus-wide view on how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) by introducing a corpus-wide topic transition probability matrix, which models reference relationships as transitions among topics. Our approximated variational inference updates LDA posteriors and topic transition posteriors alternately. The main issue is execution time amounting to O(MK^2), where K is the number of topics and M is that of links in citation network. Therefore, we accelerate the inference with Nvidia CUDA compatible GPUs. We compare the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F). We also present topic evolution examples our method gives. © 2012 ACM.

DOI： 10.1145/2396761.2398465

Scopus

researchmap

Other links: http://dblp.uni-trier.de/db/conf/cikm/cikm2012.html#conf/cikm/MasadaT12
Unsupervised segmentation of bibliographic elements with latent permutations (peer-reviewed)

Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6724 LNCS, 254-267, 2011

Publication type: Research paper (international conference proceedings) / Publisher: Springer

This paper introduces a novel approach for large-scale unsupervised segmentation of bibliographic elements. Our problem is to segment a word token sequence representing a citation into subsequences each corresponding to a different bibliographic element, e.g. authors, paper title, journal name, publication year, etc. Each bibliographic element should be represented by contiguous word tokens. We call this the contiguity constraint. Therefore, we should infer a sequence of assignments of word tokens to bibliographic elements so that this constraint is satisfied. Many HMM-based methods solve this problem by prescribing fixed transition patterns among bibliographic elements. In this paper, we use generalized Mallows models (GMM) in a Bayesian multi-topic model, effectively applied to document structure learning by Chen et al. [4], and infer a permutation of latent topics, each of which can be interpreted as one of the bibliographic elements. According to the inferred permutation, we arrange the order of the draws from a multinomial distribution defined over topics. In this manner, we can obtain an ordered sequence of topic assignments satisfying the contiguity constraint. We do not need to prescribe any transition patterns among bibliographic elements. We only need to specify the number of bibliographic elements. However, the method proposed by Chen et al. works for our problem only after modification. The main contribution of this paper is to propose strategies that make their method work for our problem as well. © 2011 Springer-Verlag.

DOI： 10.1007/978-3-642-24396-7_20

Scopus

researchmap
Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models (peer-reviewed)

Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I 6634, 435-447, 2011

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

This paper provides a new approach to topical trend analysis. Our aim is to improve the generalization power of latent Dirichlet allocation (LDA) by using document timestamps. Many previous works model topical trends by making latent topic distributions time-dependent. We propose a straightforward approach by preparing a different word multinomial distribution for each time point. Since this approach increases the number of parameters, overfitting becomes a critical issue. Our contribution to this issue is two-fold. First, we propose an effective way of defining Dirichlet priors over the word multinomials. Second, we propose a special scheduling of variational Bayesian (VB) inference. Comprehensive experiments with six datasets prove that our approach can improve LDA and also Topics over Time, a well-known variant of LDA, in terms of test data perplexity in the framework of VB inference.

DOI： 10.1007/978-3-642-20841-6_36

researchmap
DOCUMENTS AS A BAG OF MAXIMAL SUBSTRINGS - An Unsupervised Feature Extraction for Document Clustering (peer-reviewed)

Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

ICEIS 2011: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1, 5-13, 2011

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION

This paper provides experimental results showing how we can use maximal substrings as elementary features in document clustering. We extract maximal substrings, i.e., the substrings each giving a smaller number of occurrences even after adding only one character at its head or tail, from the given document set and represent each document as a bag of maximal substrings after reducing the variety of maximal substrings by a simple frequency-based selection. This extraction can be done in an unsupervised manner. Our experiment aims to compare bag of maximal substrings representation with bag of words representation in document clustering. For clustering documents, we utilize Dirichlet compound multinomials, a Bayesian version of multinomial mixtures, and measure the results by F-score. Our experiment showed that maximal substrings were as effective as words extracted by a dictionary-based morphological analysis for Korean documents. For Chinese documents, maximal substrings were not so effective as words extracted by a supervised segmentation based on conditional random fields. However, one fourth of the clustering results given by bag of maximal substrings representation achieved F-scores better than the mean F-score given by bag of words representation. It can be said that the use of maximal substrings achieved an acceptable performance in document clustering.
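
The definition of a maximal substring given in this abstract can be illustrated with a brute-force sketch. This is not the extraction procedure used in the paper, which the abstract does not describe; the function name `maximal_substrings`, the `min_count` threshold, and the single-string toy input are assumptions made purely for illustration.

```python
from collections import Counter


def maximal_substrings(text, min_count=2):
    """Return substrings whose occurrence count drops when extended by one character.

    Brute force over all substrings (quadratic in len(text)); toy inputs only.
    min_count is an assumed threshold that excludes substrings occurring once.
    """
    # Count every substring by start position, so overlapping occurrences count.
    counts = Counter(
        text[i:j] for i in range(len(text)) for j in range(i + 1, len(text) + 1)
    )
    alphabet = set(text)
    result = set()
    for sub, n in counts.items():
        if n < min_count:
            continue
        # Every one-character extension at the head or tail must occur fewer times.
        extensions = [c + sub for c in alphabet] + [sub + c for c in alphabet]
        if all(counts.get(ext, 0) < n for ext in extensions):
            result.add(sub)
    return result


found = maximal_substrings("abcabc")
print(found)
```

In this toy input, "ab" is not maximal because extending it to "abc" leaves the occurrence count unchanged, whereas every one-character extension of "abc" occurs less often.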

researchmap
Semi-supervised Bibliographic Element Segmentation with Latent Permutations (peer-reviewed)

Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

DIGITAL LIBRARIES: FOR CULTURAL HERITAGE, KNOWLEDGE DISSEMINATION, AND FUTURE CREATION 7008, 60-+, 2011

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

This paper proposes a semi-supervised bibliographic element segmentation. Our input data is a large scale set of bibliographic references each given as an unsegmented sequence of word tokens. Our problem is to segment each reference into bibliographic elements, e.g. authors, title, journal, pages, etc. We solve this problem with an LDA-like topic model by assigning each word token to a topic so that the word tokens assigned to the same topic refer to the same bibliographic element. Topic assignments should satisfy contiguity constraint, i.e., the constraint that the word tokens assigned to the same topic should be contiguous. Therefore, we proposed a topic model in our preceding work [8] based on the topic model devised by Chen et al. [3]. Our model extends LDA and realizes unsupervised topic assignments satisfying contiguity constraint. The main contribution of this paper is the proposal of a semi-supervised learning for our proposed model. We assume that at most one third of word tokens are already labeled. In addition, we assume that a few percent of the labels may be incorrect. The experiment showed that our semi-supervised learning improved the unsupervised learning by a large margin and achieved an over 90% segmentation accuracy.

DOI： 10.1007/978-3-642-24826-9_11

researchmap
Implementation of a programming environment with a multithread model for reconfigurable systems (peer-reviewed)

Keisuke Dohi, Yuichiro Shibata, Tsuyoshi Hamada, Tomonari Masada, Kiyoshi Oguri, Duncan A. Buell

ACM SIGARCH Computer Architecture News 38(4), 40-45, September 14, 2010

Publication type: Research paper (scientific journal) / Publisher: Association for Computing Machinery (ACM)

DOI： 10.1145/1926367.1926375

researchmap
Infinite Latent Process Decomposition (peer-reviewed)

Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri

2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 810-811, 2010

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE COMPUTER SOC

This paper presents infinite latent process decomposition (iLPD), a new microarray analysis method, as an extension of latent process decomposition. Our method assumes an infinite number of latent processes. Further, our new collapsed variational Bayesian inference improves the inference proposed in [2] in the treatment of Dirichlet hyperparameters. We also give the results of the comparison experiment.

researchmap
Modeling Topical Trends over Continuous Time with Priors (peer-reviewed)

Tomonari Masada, Daiji Fukagawa, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri

ADVANCES IN NEURAL NETWORKS - ISNN 2010, PT 2, PROCEEDINGS 6064, 302-+, 2010

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

In this paper, we propose a new method for topical trend analysis. We model topical trends by per-topic Beta distributions as in Topics over Time (TOT), proposed as an extension of latent Dirichlet allocation (LDA). However, TOT is likely to overfit to timestamp data in extracting latent topics. Therefore, we apply prior distributions to the Beta distributions in TOT. Since the Beta distribution has no conjugate prior, we devise a trick, where we set one of the two parameters of each per-topic Beta distribution to one based on a Bernoulli trial and apply a Gamma distribution as a conjugate prior. Consequently, we can marginalize out the parameters of the Beta distributions and thus treat timestamp data in a Bayesian fashion. In the evaluation experiment, we compare our method with LDA and TOT in a link detection task on the TDT4 dataset. We use word predictive probabilities as term weights and estimate document similarities by using those weights in a TFIDF-like scheme. The results show that our method achieves a moderate fitting to timestamp data.

DOI： 10.1007/978-3-642-13318-3_38

researchmap
A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs - Towards cost effective, high performance N-body simulation (peer-reviewed)

Tsuyoshi Hamada, Keigo Nitadori, Khaled Benkrid, Yousuke Ohno, Gentaro Morimoto, Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri, Makoto Taiji

Computer Science - Research and Development 24(1-2), 21-31, September 2009

Publication type: Research paper (scientific journal)

Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. These applications include astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple O(N^2) algorithm was used for GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were based on more optimized O(N log N) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC clusters for N-body simulations. In this paper, we report a new method for efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows the realization of an N-body simulation on a GPU cluster at a much higher performance than that on general PC clusters. We performed a cosmological simulation with 562 million particles on a GPU cluster using 128 NVIDIA GeForce 8800GTS GPUs at an overall cost of $168,172. We obtained a sustained performance of 20.1 Tflops, which when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved cost/performance was hence a mere $19.8/Gflops, which shows the high competitiveness of GPGPUs. © 2009 Springer-Verlag.
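
The cost/performance figure quoted in this abstract can be checked against the other reported numbers with a few lines of arithmetic. The sketch below is only a sanity check; the variable and function names are illustrative, and it assumes that the quoted ratio is the overall cost divided by the normalized 8.50 Tflops rather than by the raw sustained 20.1 Tflops.

```python
# Figures taken from the abstract.
total_cost_usd = 168172
sustained_tflops = 20.1
normalized_tflops = 8.50


def usd_per_gflops(cost_usd, tflops):
    """Cost per Gflops, using 1 Tflops = 1000 Gflops."""
    return cost_usd / (tflops * 1000.0)


normalized_ratio = usd_per_gflops(total_cost_usd, normalized_tflops)
raw_ratio = usd_per_gflops(total_cost_usd, sustained_tflops)

print(f"{normalized_ratio:.1f} USD/Gflops (normalized performance)")
print(f"{raw_ratio:.1f} USD/Gflops (raw sustained performance)")
```

Under this assumption, only the normalized figure reproduces the quoted value of about 19.8 USD per Gflops.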

DOI： 10.1007/s00450-009-0089-1

Scopus

researchmap
GPUを用いた位相限定相関法の高速化(ITS画像処理,映像メディア及び一般) (peer-reviewed)

松尾堅太郎, 三好正之, 濱田剛, 柴田裕一郎, 正田備也, 小栗清

映像情報メディア学会技術報告 33(0), 201-206, 2009

Language: Japanese / Publisher: 映像情報メディア学会

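Phase-only correlation is a computation method that achieves high robustness and high sub-pixel accuracy in image matching and image registration, but it also has an enormous computational cost. Previous attempts to accelerate phase-only correlation have used dedicated LSIs or FPGAs. We devised a new acceleration method for phase-only correlation that uses a GPU (Graphics Processing Unit) and implemented it on an Nvidia GeForce 8800GTS GPU. We confirmed that a single GPU can process a 256×256 pixel image in 2.36 seconds, a 512×512 pixel image in 7.92 seconds, and a 1024×1024 pixel image in 27.65 seconds, and that this is about 10 times faster than the computation speeds of earlier approaches using dedicated LSIs or FPGAs.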

DOI： 10.11485/itetr.33.6.0_201

CiNii Article

researchmap

Other links: http://hdl.handle.net/10069/22664
Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining (peer-reviewed)

Tomonari Masada, Atsuhiro Takasu, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS 5446, 556-+, 2009

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

In this paper, we propose a new probabilistic model, Bag of Timestamps (BoT), for chronological text mining. BoT is an extension of latent Dirichlet allocation (LDA), and has two remarkable features when compared with the previously proposed Topics over Time (ToT), which is also an extension of LDA. First, we can avoid overfitting to temporal data, because temporal data are modeled in a Bayesian manner similar to word frequencies. Second, BoT has a conditional probability where no functions requiring time-consuming computations appear. The experiments using newswire documents show that BoT achieves more moderate fitting to temporal data in shorter execution time than ToT.

DOI： 10.1007/978-3-642-00672-2_51

researchmap
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices (peer-reviewed)

Tomonari Masada, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

NEXT-GENERATION APPLIED INTELLIGENCE, PROCEEDINGS 5579, 491-500, 2009

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We could obtain inference results comparable to sequential CVB inference.

DOI： 10.1007/978-3-642-02568-6_50

researchmap
Dynamic hyperparameter optimization for bayesian topical trend analysis (peer-reviewed)

Tomonari Masada, Daiji Fukagawa, Atsuhiro Takasu, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

International Conference on Information and Knowledge Management, Proceedings, 1831-1834, 2009

Publication type: Research paper (international conference proceedings) / Publisher: ACM

This paper presents a new Bayesian topical trend analysis. We regard the parameters of topic Dirichlet priors in latent Dirichlet allocation as a function of document timestamps and optimize the parameters by a gradient-based algorithm. Since our method gives similar hyperparameters to the documents having similar timestamps, topic assignment in collapsed Gibbs sampling is affected by timestamp similarities. We compute TFIDF-based document similarities by using a result of collapsed Gibbs sampling and evaluate our proposal by link detection task of Topic Detection and Tracking. Copyright 2009 ACM.

DOI： 10.1145/1645953.1646242

Scopus

researchmap

Other links: http://dblp.uni-trier.de/db/conf/cikm/cikm2009.html#conf/cikm/MasadaFTHSO09
Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation (peer-reviewed)

Tomonari Masada, Tsuyoshi Hamada, Yuichiro Shibata, Kiyoshi Oguri

ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS 5678, 253-264, 2009

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

This paper provides a new method for multi-topic Bayesian analysis for microarray data. Our method achieves a further maximization of lower bounds in a marginalized variational Bayesian inference (MVB) for Latent Process Decomposition (LPD), which is an effective probabilistic model for microarray data. In our method, hyperparameters in LPD are updated by empirical Bayes point estimation. The experiments based on microarray data of realistically large size show efficiency of our hyperparameter reestimation technique.

DOI： 10.1007/978-3-642-03348-3_26

researchmap
LDA文書モデルによる画像からの多重トピック抽出のGPUを用いた高速化(高精細度画像の処理・表示および一般) (peer-reviewed)

正田備也, 濱田剛, 柴田裕一郎, 小栗清

映像情報メディア学会技術報告 32(0), 1-6, 2008

Language: Japanese / Publisher: 一般社団法人映像情報メディア学会

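In this paper, we propose a method that uses GPUs to accelerate multi-topic extraction from images with the LDA (latent Dirichlet allocation) language model. LDA was proposed by Blei et al. as a probabilistic model for text mining, but in recent years it has also been applied to other kinds of multimedia information. We therefore apply LDA to Wang's 10,000 test images and perform multi-topic extraction. We use collapsed variational Bayesian inference for parameter estimation in LDA and propose a method that accelerates the estimation with Nvidia CUDA compatible GPUs.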

DOI： 10.11485/itetr.32.54.0_1

CiNii Article

researchmap
GPUを用いたサブペタペタフロップス高性能計算機システム(高精細度画像の処理・表示および一般) (peer-reviewed)

濱田剛, 正田備也, 柴田裕一郎, 小栗清

映像情報メディア学会技術報告 32(0), 17-19, 2008

Language: Japanese / Publisher: 一般社団法人映像情報メディア学会

DOI： 10.11485/itetr.32.54.0_17

CiNii Article

researchmap
Unmixed spectrum clustering for template composition in lung sound classification (peer-reviewed)

Tomonari Masada, Senya Kiyasu, Sueharu Miyahara

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS 5012, 964-969, 2008

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

In this paper, we propose a method for composing templates of lung sound classification. First, we obtain a sequence of power spectra by FFT for each given lung sound and compute a small number of component spectra by ICA for each of the overlapping sets of tens of consecutive power spectra. Second, we put component spectra obtained from various lung sounds into a single set and conduct clustering a large number of times. When component spectra belong to the same cluster in all clustering results, these spectra show robust similarity. Therefore, we can use such spectra to compose a template of lung sound classification.

DOI： 10.1007/978-3-540-68125-0_100

researchmap
Comparing LDA with pLSI as a dimensionality reduction method in document clustering (peer-reviewed)

Tomonari Masada, Senya Kiyasu, Sueharu Miyahara

LARGE-SCALE KNOWLEDGE RESOURCES 4938, 13-26, 2008

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: SPRINGER-VERLAG BERLIN

In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as a dimensionality reduction method and investigate their effectiveness in document clustering by using real-world document sets. For clustering of documents, we use a method based on multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for the evaluation of clustering results. Our experiment shows that the dimensionality reduction via LDA and pLSI results in document clusters of almost the same quality as those obtained by using original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment provides no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI at least for dimensionality reduction in document clustering.
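
The F-measure mentioned in this abstract, the harmonic mean of precision and recall, can be written as a small helper. This is a generic sketch; how precision and recall are derived from the clusters and category labels in the paper is not specified here, and the function name `f_measure` and the zero-denominator convention are assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero (an assumed convention that
    avoids division by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


example = f_measure(0.5, 1.0)
print(example)
```

Because the harmonic mean is dominated by the smaller value, a clustering with perfect recall but a precision of 0.5 scores well below the arithmetic mean of 0.75.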

DOI： 10.1007/978-3-540-78159-2_2

researchmap
P2P情報検索における単語の重みに基づいたデータ分散配置手法 (co-authored)

倉沢央, 若木宏美, 正田備也, 高須淳宏, 安達淳

情報処理学会マルチメディア、分散、協調とモバイルシンポジウム(DICOMO2007), July 2007

Language: Japanese / Publication type: Research paper (scientific journal)

researchmap
混合ディリクレ分布を用いた文書分類の精度について (co-authored, peer-reviewed)

正田備也, 高須淳宏, 安達淳

情報処理学会論文誌：データベース 48(SIG11(TOD34)), 14-26, June 15, 2007

Language: Japanese / Publication type: Research paper (scientific journal) / Publisher: 情報処理学会

The naive Bayes classifier is a well-known method for document classification. However, the naive Bayes classifier gives a satisfying classification accuracy only after an appropriate tuning of the smoothing parameter. Moreover, we should find appropriate parameter values separately for different document sets. In this paper, we focus on an effective probabilistic framework for document classification, called Dirichlet mixtures, which requires no parameter tuning and provides satisfying classification accuracies with respect to various document sets. Many studies in the fields of image processing and natural language processing utilize Dirichlet mixtures. In particular, in the field of natural language processing, many experiments are conducted by using real document data sets. However, most studies use the perplexity as an evaluation measure. While the perplexity is a purely theoretical measure, the accuracy is popular for document classification in the fields of information retrieval and text mining. The accuracy is computed by comparing correct labels with predictions made by the classifier. In this paper, we conduct an evaluation experiment by using the 20 newsgroups data set and Korean Web newspaper articles, with a view to using Dirichlet mixtures for multilingual applications. In the experiment, we compare the naive Bayes classifier with the classifier based on Dirichlet mixtures and clarify their qualitative and quantitative differences.

CiNii Article

researchmap

Other links: http://hdl.handle.net/10069/16317
P2P情報検索における索引とファイルの分散配置手法

倉沢央, 正田備也, 高須淳宏, 安達淳

情報処理学会研究報告 2007(36), 147-154, April 5, 2007

Language: Japanese / Publication type: Research paper (scientific journal)

researchmap
トピック指向単語クラスタリングを用いた複数トピックの包括的提示による検索支援

若木裕美, 正田備也, 高須淳宏, 安達淳

電子情報通信学会第18回データ工学ワークショップ (DEWS 2007), March 2007

Language: Japanese / Publication type: Research paper (scientific journal)

researchmap
Detection of abnormal lung sounds through investigation of breathing cycle (peer-reviewed)

Senya Kiyasu, Kohsuke Yanagihara, Tomonari Masada, Sueharu Miyahara, Mikio Oka

Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers 61(12), 1769-1773, 2007

Language: Japanese / Publication type: Research paper (scientific journal) / Publisher: Inst. of Image Information and Television Engineers

The purpose of our research is to develop a method for recognizing abnormal lung sounds without the need of a medical specialist. Listening to the sounds of the human body is one of the most important methods of checking someone's health. However, identification of abnormal lung sounds is difficult for an untrained person. We differentiated true abnormal sounds from interfering noise by investigating the fact that lung sounds are generated periodically in relation to the breathing cycle.

DOI： 10.3169/itej.61.1769

Scopus

researchmap
Using a Knowledge Base to Disambiguate Personal Name in Web Search Results (peer-reviewed)

Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

APPLIED COMPUTING 2007, VOL 1 AND 2, 839-+, 2007

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: ASSOC COMPUTING MACHINERY

Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method.

DOI： 10.1145/1244002.1244188

researchmap

Other links: http://dblp.uni-trier.de/db/conf/sac/sac2007.html#conf/sac/VuMTA07
Disambiguation of people in web search using a knowledge base (peer-reviewed)

Quang Minh Vu, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

2007 IEEE International Conference on Research, Innovation and Vision for the Future, RIVF 2007, 185-191, 2007

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE

Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method. © 2007 IEEE.

DOI： 10.1109/RIVF.2007.369155

Scopus

researchmap
Query Refinement based on Topical Term Clustering. (peer-reviewed)

Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications) - RIAO 2007, 8th International Conference, Carnegie Mellon University, Pittsburgh, PA, USA, May 30 - June 1, 2007. Proceedings, CD-ROM, 2007

Publisher: CID

researchmap
Citation data clustering for author name disambiguation. (peer-reviewed)

Tomonari Masada, Atsuhiro Takasu, Jun Adachi

Proceedings of the 2nd International Conference on Scalable Information Systems, Infoscale 2007, Suzhou, China, June 6-8, 2007, 62, 2007

Publisher: ACM

DOI： 10.4108/infoscale.2007.203

researchmap
Using web directories for similarity measurement in personal name disambiguation (peer-reviewed)

Quang Minh Vu, Atsuhiro Takasu, Tomonari Masada, Jun Adachi

Proceedings - 21st International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINAW'07 2, 379-384, 2007

Language: English / Publication type: Research paper (international conference proceedings) / Publisher: IEEE Computer Society

In this paper, we target the problem of personal name disambiguation in search results returned by personal name queries. Usually, a personal name refers to several people. Therefore, when a search engine returns a set of documents containing that name, they are often relevant to several individuals sharing the same name. Automatic differentiation of people in the resulting documents may help users to search for the person of interest more easily. We propose a method that uses web directories to improve the similarity measurement in personal name disambiguation. We carried out experiments on real web documents in which we compared our method with the vector space model method and the named entity recognition method. The results show that our method has advantages over these previous methods. © 2007 IEEE.

DOI： 10.1109/AINAW.2007.367

Scopus

researchmap
具体性指向単語クラスタリングによる網羅的トピックの発見と検索質問拡張支援

若木裕美, 正田備也, 高須淳宏, 安達淳

電子情報通信学会第17回データ工学ワークショップ (DEWS 2006), 2C-i4, March 2006

Language: Japanese / Publication type: Research paper (scientific journal)

researchmap
A new measure for query disambiguation using term co-occurrences (peer-reviewed)

Hiromi Wakaki, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2006, PROCEEDINGS 4224, 904-911, 2006

Language: English / Publication type: Research paper (scientific journal) / Publisher: SPRINGER-VERLAG BERLIN

This paper explores techniques that discover terms to replace given query terms from a selected subset of documents. The Internet allows access to large numbers of documents archived in digital format. However, no user can be an expert in every field, and users have trouble finding the documents that suit their purposes when they cannot formulate queries that narrow the search to the context they have in mind. Accordingly, we propose a method for extracting terms from searched documents to replace user-provided query terms. Our results show that our method is successful in discovering terms that can be used to narrow the search.

DOI： 10.1007/11875581_108

researchmap
Link-Based Clustering for Finding Subrelevant Web Pages (peer-reviewed)

Tomonari Masada, Atsuhiro Takasu, Jun Adachi

Proc. International Workshop on Web Document Analysis, 2005 (WDA2005), September 2005

Language: English

researchmap
検索語の曖昧性を解消するキーワードの提示手法

若木裕美, 正田備也, 高須淳宏, 安達淳

情報処理学会研究報告「データベースシステム」 137(137), 269-276, July 2005

Language: Japanese / Publication type: Research paper (workshop or symposium materials, etc.)

CiNii Article

researchmap
共著関係に基づくグラフを用いた書誌情報における著者同定手法の提案と評価

鈴木康平, 正田備也, 高須淳宏, 安達淳

情報処理学会研究報告「データベースシステム」(夏のデータベースワークショップDBWS2005), 2005. (137), July 2005

Language: Japanese / Publication type: Research paper (workshop or symposium materials, etc.)

researchmap
Improving Web Search by Query Expansion with a Small Number of Terms. (peer-reviewed)

Tomonari Masada, Teruhito Kanazawa, Atsuhiro Takasu, Jun Adachi

Proceedings of the Fifth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NTCIR-5, National Center of Sciences, Tokyo, Japan, December 6-9, 2005, 2005

Publisher: National Institute of Informatics (NII)

researchmap

Other links: http://dblp.uni-trier.de/db/conf/ntcir/ntcir2005.html#conf/ntcir/MasadaKTA05
Decomposing the Web graph into parameterized connected components (peer-reviewed)

T Masada, A Takasu, J Adachi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E87D(2), 380-388, February 2004

Language: English / Publication type: Research paper (scientific journal) / Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.

researchmap

Other links: http://dblp.uni-trier.de/db/journals/ieicet/ieicet87d.html#journals/ieicet/MasadaTA04
R2D2 at NTCIR-4 Web Retrieval Task. (peer-reviewed)

Teruhito Kanazawa, Tomonari Masada, Atsuhiro Takasu, Jun Adachi

Proceedings of the Fourth NTCIR Workshop on Research in Information Access Technologies Information Retrieval, Question Answering and Summarization, NTCIR-4, National Center of Sciences, Tokyo, Japan, June 2-4, 2004, 2004

Publisher: National Institute of Informatics (NII)

researchmap

Other links: http://dblp.uni-trier.de/db/conf/ntcir/ntcir2004.html#conf/ntcir/KanazawaMTA04
Web page grouping based on parameterized connectivity (peer-reviewed)

T Masada, A Takasu, J Adachi

DATABASE SYSTEMS FOR ADVANCED APPLICATIONS 2973, 374-380, 2004

Language: English / Publication type: Research paper (scientific journal) / Publisher: SPRINGER-VERLAG BERLIN

We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the Web, page grouping is expected to provide a general grasp of the Web for effective Web search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our method is a generalization of the decomposition into strongly connected components. Each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by a parameter, called the threshold parameter. We call the resulting groups parameterized connected components. The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our method.

DOI： 10.1007/978-3-540-24571-1_34

researchmap
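Effective Use of Web Information via Parameterized Connected Component Decomposition

正田備也, 高須淳宏, 安達淳

IPSJ SIG Technical Reports, Database Systems (Summer Database Workshop DBWS2003), 2003. ( 131(71 ) July 22, 2003

Language: Japanese / Publication type: Research paper (workshop or symposium material)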

researchmap
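Web Page Grouping by Parameterized Connected Component Decomposition

正田備也, 高須淳宏, 安達淳

IPSJ SIG on Database Systems, IPSJ SIG Technical Reports 2002 ( 67, DB ) 297 - 304, July 2002

Language: Japanese / Publication type: Research paper (workshop or symposium material)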

researchmap
A Package for Triangulations. (Peer-reviewed)

Tsuyoshi Ono, Yoshiaki Kyoda, Tomonari Masada, Kazuyoshi Hayase, Tetsuo Shibuya, Motoki Nakade, Mary Inaba, Hiroshi Imai, Keiko Imai, David Avis

Proceedings of the Twelfth Annual Symposium on Computational Geometry, Philadelphia, PA, USA, May 24-26, 1996, V-17-V-18 - 17, 1996

Publisher: ACM

researchmap

Other links: http://dblp.uni-trier.de/db/conf/compgeom/compgeom96.html#conf/compgeom/OnoKMHSNIIIA96
Enumeration of Regular Triangulations. (Peer-reviewed)

Tomonari Masada, Hiroshi Imai, Keiko Imai

Proceedings of the Twelfth Annual Symposium on Computational Geometry, Philadelphia, PA, USA, May 24-26, 1996, 224 - 233, 1996

Publisher: ACM

CiNii Article

researchmap

Other links: http://dblp.uni-trier.de/db/conf/compgeom/compgeom96.html#conf/compgeom/MasadaII96


MISC

Text Mining the Zenkyoto Generation

近藤伸郎, 正田備也

Proceedings of Jinmonkon 2020 ( 2020 ) 297 - 302, December 5, 2020

Language: Japanese

CiNii Article

researchmap
Learning Tasks Aimed at Enhancing Student Participation in Lecture-Based Classes

丹羽量久, 正田備也, 福澤勝彦, 三根眞理子, 山地弘起

Journal of the Center for Educational Innovation, Nagasaki University 5 ( 5 ) 19 - 24, March 2014

Language: Japanese / Publisher: Center for Educational Innovation, Nagasaki University

General education reform at Nagasaki University has required new pedagogies that enhance student participation in lecture classes. The authors addressed this urgent issue by developing widely applicable methods in an interdisciplinary course titled "Information and Society." The course consisted of four lecture series on ICT applications, in which 72 students engaged in learning tasks designed to facilitate note-taking of key concepts and general reflection on the lecture content, as well as assessment of their comprehension level. The main instructor edited the students' descriptions and posted them on the course site so that the whole class could share the learning and prepare for feedback sessions. Students also responded to questionnaires designed to inquire into their prior conceptualizations. Future directions for using effective learning tasks in lecture classes are discussed.

CiNii Article

researchmap

Other links: http://hdl.handle.net/10069/34322
Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

Tomonari Masada

International Journal of Organizational and Collective Intelligence 2 ( 2 ) 49 - 62, 2011

researchmap
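A Study of Automatic Optimization of DMA Transfers and Data Placement on Reconfigurable Machines

志田さや香, 土肥慶亮, 柴田裕一郎, 濱田剛, 正田備也, 小栗清

The IEICE transactions on information and systems (Japanese edition) 92 ( 12 ) 2127 - 2136, December 1, 2009

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

To run applications at high speed on a reconfigurable machine, data must be placed appropriately across the multiple memory banks directly connected to the FPGA, and DMA transfers to and from the host must be overlapped effectively with processing on the FPGA. However, users generally have to carry out these optimizations themselves, which is one reason programming reconfigurable machines is difficult. This paper proposes a technique for hiding DMA transfer time when the host processor performs preprocessing, and shows how to automate it by incorporating it into a data placement optimization method based on integer programming. The evaluation shows that the method achieves up to 1.46 times the performance of software execution, and that the overhead of preprocessing on the host processor is about 2%.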

CiNii Article

researchmap
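Evaluation of a Circuit Proliferation Method Using the Concept of Pressure in PCA

荒木裕太, 柴田裕一郎, 濱田剛, 正田備也, 小栗清

IEICE Technical Report. RECONF, Reconfigurable Systems 109 ( 320 ) 19 - 24, November 26, 2009

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

PCA is wired logic with an autonomous reconfiguration capability that allows functions to be changed and added dynamically. However, making full use of PCA's dynamic reconfiguration capability requires distributed area management, which is one of PCA's open problems. This report introduces a circuit construction method we proposed previously, which mimics the cell proliferation of living organisms, and compares it with circuit construction under newly defined rules. In an evaluation on random graphs, the number of steps was reduced to as little as about one fifth.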

CiNii Article

researchmap
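Implementation of a Power Supply Voltage Control Circuit on an FPGA and Evaluation of Its Control Accuracy

副島政人, 酒見隼也, 柴田裕一郎, 黒川不二雄, 濱田剛, 正田備也, 小栗清

IEICE Technical Report. RECONF, Reconfigurable Systems 109 ( 198 ) 19 - 24, September 10, 2009

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

From the standpoint of energy saving, demand for more stable and efficient DC power supply voltage keeps growing. Digitally controlled DC-DC converters in particular are attracting attention because they offer high reliability and flexibility. This report discusses the control of a DC-DC converter using an FPGA. Compared with software processing on a DSP or similar device, this approach can be expected to run faster, and it has the advantage of adapting flexibly to a variety of control algorithms. Through a prototype DC-DC converter, we empirically discuss the arithmetic precision such a system requires and the trade-off between precision and hardware cost. The characterization results show that fixed-point arithmetic with a fractional part of about 10 bits gives sufficient control accuracy. We also evaluated the effect of a design that uses multiple phase-shifted clocks to raise PWM resolution while limiting the increase in system clock frequency.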

CiNii Article

researchmap
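A Study of Memory Access Optimization Using a Multithreaded Programming Model for Reconfigurable Systems

土肥慶亮, 志田さや香, 柴田裕一郎, 濱田剛, 正田備也, 小栗清

IEICE Technical Report. RECONF, Reconfigurable Systems 109 ( 26 ) 61 - 66, May 7, 2009

Language: English / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

It is known that for some applications reconfigurable systems achieve higher performance and better cost-performance than conventional microprocessor architectures. However, extracting the full performance of such a system often requires specialized knowledge, such as design and coding suited to the architecture. This report aims to support users of reconfigurable systems by building a development environment that adopts a multithreaded programming model. Experiments show that the implemented translator can automatically generate code that is effective for improving performance, including DMA transfers and shift registers for memory access optimization.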

CiNii Article

researchmap
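A Sub-Petaflops High-Performance Computing System Using GPUs

濱田剛, 正田備也, 柴田裕一郎, 小栗清

IEICE Technical Report. IE, Image Engineering 108 ( 324 ) 17 - 19, November 21, 2008

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)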

CiNii Article

researchmap
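GPU Acceleration of Multi-Topic Extraction from Images with the LDA Document Model

正田備也, 濱田剛, 柴田裕一郎, 小栗清

IEICE Technical Report. IE, Image Engineering 108 ( 324 ) 1 - 6, November 21, 2008

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

This paper proposes a method that uses GPUs to accelerate multi-topic extraction from images with the LDA (latent Dirichlet allocation) language model. LDA was proposed by Blei et al. as a probabilistic model for text mining, and in recent years it has also been applied to other kinds of multimedia data. We therefore apply LDA to Wang's 10,000 test images and perform multi-topic extraction. Parameters for LDA are estimated with collapsed variational Bayesian inference, and we propose a method that accelerates this estimation on Nvidia CUDA-compatible GPUs.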

CiNii Article

researchmap
Dimensionality Reduction with Latent Dirichlet Allocation for Document Clustering

正田備也, 喜安千弥, 宮原末治

IPSJ SIG Technical Reports, Database Systems (DBS) 2007 ( 65 ) 381 - 386, July 3, 2007

Language: Japanese / Publisher: Information Processing Society of Japan (IPSJ)

In this paper, we employ latent Dirichlet allocation, proposed by Blei et al., as a method for reducing the dimensionality of feature vectors and show its effectiveness in document clustering. In the evaluation experiment, we cluster Japanese and Korean Web news articles and regard the category assigned to each article as the ground truth for evaluating the clustering. We compare two settings, both of which cluster with a multinomial mixture model: one takes feature vectors whose entries are raw term frequencies as input, and the other takes feature vectors whose dimensionality has been reduced by latent Dirichlet allocation.

CiNii Article

researchmap

Other links: http://id.nii.ac.jp/1001/00018810/
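Dimensionality Reduction with Latent Dirichlet Allocation for Document Clustering

正田備也, 喜安千弥, 宮原末治

IEICE Technical Report. DE, Data Engineering 107 ( 131 ) 381 - 386, July 2, 2007

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

This paper uses latent Dirichlet allocation, proposed by Blei et al., as a method for reducing the dimensionality of feature vectors and shows its effectiveness in document clustering. In the evaluation experiment, we cluster Japanese and Korean Web news articles and use the genre each article belongs to for evaluating the clustering results. We compare two settings, both of which cluster with a multinomial mixture model: one takes raw term frequencies as input, and the other takes feature vectors whose dimensionality has been reduced by latent Dirichlet allocation.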

CiNii Article

researchmap
Personal Name Disambiguation in Web Search Using Knowledge Base (jointly worked)

Quang Minh VU, Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

DBSJ Letters 5 ( 4 ) 53 - 56, 2007

researchmap
Disambiguating Personal Names in Search Using a Knowledge Base

VuQuangMinh, 正田備也, 高須淳宏, 安達淳

IPSJ SIG Technical Reports, Database Systems (DBS) 2006 ( 78 ) 185 - 192, July 13, 2006

Language: English / Publisher: Information Processing Society of Japan (IPSJ)

Results of queries by personal name often contain documents about several people because of the namesake problem. To separate documents about different people, an effective method is needed to measure document similarity and to find the documents relevant to the same person. Previous studies have used cosine similarity or have tried to extract common named entities to measure similarity. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses a measure of those shared contexts as the similarity between the documents. Experimental results show that our proposed method outperforms the cosine similarity method and the common named entities method.

CiNii Article

researchmap

Other links: http://id.nii.ac.jp/1001/00018907/
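Disambiguating Personal Names in Search Using a Knowledge Base

ヴークァンミン, 正田備也, 高須淳宏, 安達淳

IEICE Technical Report. DE, Data Engineering 106 ( 149 ) 143 - 148, July 6, 2006

Language: English / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

When searching by personal name, the results usually contain documents about several people who share the same name. We studied a method for dividing the search results into clusters of documents, one per person. This requires measuring the similarity between documents and inferring whether they concern the same person; previous work measures this similarity with the vector space model or with named entity extraction. We propose a method that uses a knowledge base to find the contexts shared between documents, weights those shared contexts, and measures document similarity accordingly. Experiments confirmed that the proposed method outperforms the earlier methods.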

CiNii Article

researchmap
Topic-oriented Term Extraction and Term Clustering for Query Focusing

Hiromi WAKAKI, Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

IPSJ Transactions on Databases 47 ( SIG19 ) 72 - 85, 2006

researchmap
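A Method for Presenting Keywords That Resolve Query Term Ambiguity

若木裕美, 正田備也, 高須淳宏, 安達淳

IEICE Technical Report. DE, Data Engineering 105 ( 172 ) 1 - 6, July 7, 2005

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

Existing search engines rely mainly on keyword search, so finding a suitable combination of a few query words is not easy. This report proposes a method that resolves the ambiguity of a search query by exploiting low-frequency co-occurrence of words. The method assigns weights to words based on the hypothesis that "a word that co-occurs with many different kinds of words cannot have an independent topic." We show that adding the words found by this method to the original query is highly effective in raising average precision. Compared with the top-ranked words given by other methods, these words also tend to split the results into groups with more fine-grained, specific content. The method thus makes it possible to find words that match the search query.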

CiNii Article

researchmap
Improving Web Search Performance with Hyperlink Information

Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

IPSJ Transactions on Databases 46 ( SIG8 ) 48 - 59, 2005

researchmap
Decomposing the Web Graph into Parameterized Connected Components

MASADA Tomonari, TAKASU Atsuhiro, ADACHI Jun

IEICE transactions on information and systems 87 ( 2 ) 380 - 388, February 1, 2004

Language: English / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

We propose a novel method for Web page grouping based only on hyperlink information. Because of the explosive growth of the World Wide Web, page grouping is expected to provide a general grasp of the Web for effective information search and netsurfing. The Web can be regarded as a gigantic digraph where pages are vertices and links are arcs. Our grouping method is a generalization of decomposition into strongly connected components, in which each group is constructed as a subset of a strongly connected component. Moreover, group sizes can be controlled by adjusting a parameter, called the threshold parameter. We call the resulting groups parameterized connected components (PCCs). The algorithm is simple and admits parallelization. Notably, we apply Dijkstra's shortest path algorithm in our grouping method. This paper also includes experimental results for 15 million Web pages, which show the contribution of our method to efficient Web surfer navigation.

CiNii Article

researchmap
A New Notion of Connectivity and its Application to Web Page Grouping

Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

DBSJ Letters 2 ( 1 ) 3 - 6, 2003

Language: Japanese / Publisher: The Database Society of Japan

CiNii Article

researchmap
Enumerating triangulations in general dimensions

H Imai, T Masada, F Takeuchi, K Imai

INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS 12 ( 6 ) 455 - 480, December 2002

Language: English / Publisher: WORLD SCIENTIFIC PUBL CO PTE LTD

We propose algorithms to enumerate (1) regular triangulations, (2) spanning regular triangulations, (3) equivalence classes of regular triangulations with respect to symmetry, and (4) all triangulations. All of the algorithms are for arbitrary points in general dimension. They work in output-size sensitive time with memory only of several times the size of a triangulation. For the enumeration of regular triangulations, we use the fact by Gel'fand, Zelevinskii and Kapranov that regular triangulations correspond to the vertices of the secondary polytope. We use reverse search technique by Avis and Fukuda, its extension for enumerating equivalence classes of objects, and a reformulation of a maximal independent set enumeration algorithm. The last approach can be extended for enumeration of dissections.

DOI： 10.1142/S0218195902000980

researchmap
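Web Page Grouping by Parameterized Connected Component Decomposition

正田備也, 高須淳宏, 安達淳

IEICE Technical Report. DE, Data Engineering 102 ( 208 ) 137 - 142, July 11, 2002

Language: Japanese / Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

The rapid growth of information on the WWW is making Web search methods based solely on textual information increasingly impractical. In recent years, many studies have therefore provided effective search methods based on link information. This paper proposes a method for grouping Web pages based on link information. The aim is to enlarge the unit of retrieval and thereby lighten the load on subsequent text-based search processing. The method also makes it possible to control the granularity of the groups by adjusting a single threshold parameter. The paper includes the results of preliminary experiments, which clarify the characteristics of the proposed grouping method.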

CiNii Article

researchmap
Grouping Web Pages Based on Parameterized Connectivity

Tomonari MASADA, Atsuhiro TAKASU, Jun ADACHI

DBSJ Letters 1 ( 1 ) 47 - 50, 2002

Language: Japanese / Publisher: The Database Society of Japan

CiNii Article

researchmap


Lectures and oral presentations

Documents as a Bag of Maximal Substrings: An Unsupervised Feature Extraction for Document Clustering

13th International Conference on Enterprise Information Systems (ICEIS 2011), 2011

researchmap
Semi-supervised Bibliographic Element Segmentation with Latent Permutations

International Conference on Asia-Pacific Digital Libraries (ICADL 2011), 2011

researchmap
Infinite Latent Process Decomposition

IEEE International Conference on Bioinformatics & Biomedicine (BIBM 2010), 2010

Presentation type: Poster presentation

researchmap
Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

The 1st International Workshop on Web Intelligent Systems and Services (WISS 2010), 2010

researchmap
Applying a Weighting Scheme to Recognize Semantic Change over Time

2010 IEEK Summer Conference, 2010

researchmap
Modeling Topical Trends over Continuous Time with Priors

The Seventh International Symposium on Neural Networks (ISNN 2010), 2010

researchmap
An Adaptive Weighting Scheme for Time-dependent Semantic Change Recognition

2010

researchmap
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA compatible devices

IEA/AIE 2009, 2009

researchmap
Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Proc. of the Joint Conference on Asia-Pacific Web Conference (APWeb) and Web-Age Information Management (WAIM), 2009

researchmap
Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation

ADMA 2009, 2009

researchmap
Dynamic Hyperparameter Optimization for Bayesian Topical Trend Analysis

CIKM 2009, 2009

Presentation type: Poster presentation

researchmap
Character Categorization via Latent Dirichlet Allocation for Kana Sequence Segmentation with Conditional Random Fields

16th International Conference on Computers in Education (ICCE 2008), 2008

Presentation type: Poster presentation

researchmap
Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification

Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2008, 2008

researchmap
Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering.

3rd International Conference on Large-scale Knowledge Resources, 2008

researchmap
Clustering Images with Multinomial Mixture Models.

8th International Symposium on advanced Intelligent Systems (ISIS 2007), 2007

researchmap
A Clustering Method for Disambiguating Author Names in Bibliographic Data

The 18th Data Engineering Workshop, 2007

researchmap
Link-Based Clustering for Finding Subrelevant Web Pages

Third International Workshop on Web Document Analysis, 2005

researchmap
Web Page Grouping Based on Parameterized Connectivity

The 9th International Conference on Database Systems for Advanced Applications, 2004

researchmap
Parameterized Connectivity and Its Application to Web Page Grouping

The 2nd Forum on Information Technology (FIT2003), 2003

researchmap
Effective Use of Web Information via Parameterized Connected Component Decomposition

Summer Database Workshop DBWS2003, 2003

researchmap
Grouping Web Pages Based on Parameterized Connectivity

The 14th Data Engineering Workshop, 2003

researchmap
Grouping Web Pages Based on Parameterized Connectivity

The 1st Forum on Information Technology, 2002

researchmap
Web Page Grouping by Parameterized Connected Component Decomposition

Summer Database Workshop DBWS2002, 2002

researchmap
Enumeration of Regular Triangulations

12th annual ACM Symposium on Computational Geometry, 1996

researchmap

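Professional memberships

The Institute of Electronics, Information and Communication Engineers (IEICE)

researchmap
Information Processing Society of Japan (IPSJ)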

researchmap

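Works

Research on Bayesian Topic Models Using Substring Frequencies as Document Features

2011 - 2012

researchmap
Information Navigation Using Statistical Rhymes

2010 - 2012

researchmap
Linking Diverse Text Information with Multi-Topic Models That Use External Knowledge

2010 - 2011

researchmap
Editorial board member, IPSJ Transactions on Databases (TOD)

2007 - 2011

researchmap
Introducing Temporality into Document and Word Similarity with Multi-Topic Models That Use Temporal Information in Text

2009 - 2010

researchmap
Analyzing Temporal Changes in Notable Topics with Multi-Topic Models That Use Temporal Information in Text

2008 - 2009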

researchmap


Joint research and competitively funded research

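Topic Models Bridging Documents as Corpus Components and Documents as Word Sequences

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

正田備也

April 2021 - March 2024

Grant number: 21K12017

Amount allocated: 4,030,000 yen (direct costs: 3,100,000 yen; indirect costs: 930,000 yen)

The aim of this research was to improve the quality of topic extraction by combining topic models, which act as corpus-specific document encoders, with transformer language models, which act as document encoders that provide general-purpose distributed representations.
In the first year, as preparation for combining transformers with variational inference for topic models, we implemented variational inference using simple word embeddings based on a multilayer perceptron (MLP) and evaluated it quantitatively with perplexity and NPMI. The results were presented at the international conference SIMBig 2021.
The contribution of the first year's work lies in solving a problem that was not anticipated when the research was planned: the considerable difficulty of using the variational autoencoder (VAE) framework for topic models. The cause is component collapse, a problem we had tried to address before. At that time we ended up handling it in an ad hoc way, for example by manually tuning how strongly the KL divergence term, which triggers component collapse, takes effect. This time we also tried the ProdLDA example implementation published on the official Pyro site, but we still could not handle the problem in a general way that works on every dataset, so we reconsidered the inference procedure itself.
As a result, the SIMBig 2021 paper proposes a method that directly maximizes the variational lower bound (ELBO) given in the original LDA paper. In other words, it does not use a VAE. By reparameterizing the posterior parameters that appear in the ELBO with an MLP, it opens a new way to combine topic models with document encoders that provide distributed representations.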

researchmap
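A Study of the Effectiveness of Using RNNs in Topic Models

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C)

正田備也

April 2018 - March 2021

Grant number: 18K11440

Amount allocated: 4,420,000 yen (direct costs: 3,400,000 yen; indirect costs: 1,020,000 yen)

The research results published in fiscal year 2018 (Heisei 30) are the following four.
(1) Use of AVB (adversarial variational Bayes) in document models (ICDPA2018 full paper): This project concerns the use of RNNs in topic models, with variational Bayesian inference (VB) as the method for posterior estimation. In deep learning, the variational autoencoder (VAE) is the main technique for setting up the approximate posterior in VB. A VAE commonly uses a diagonal Gaussian as the approximate posterior and obtains its parameters by maximizing the ELBO. AVB, proposed by Mescheder et al. in 2017, allows a more flexible approximate posterior. We applied it to document modeling and achieved a flexible posterior approximation.
(2) Automatic waka generation with RNNs (ICCS2018 poster): We proposed a method that trains an RNN on about 140,000 waka poems and generates waka automatically. A topic model scores the generated poems, and only high-scoring ones are output. This work let us build up experience in training RNNs.
(3) Mini-batch variational Bayesian inference for LDA using temporal information (PRICAI2018 short paper): Deep learning frameworks are not yet widely used for VB in LDA. This work implemented an LDA whose per-topic word distributions reflect temporal information, using tensor broadcasting in PyTorch.
(4) Use of AVB in topic models (ADMA2018 short paper): Continuing from (1), this work used AVB for variational inference in topic models. It showed that AVB can also be used in topic models for flexible posterior approximation, and gave us a sense of the feasibility of applying AVB to topic models that use RNNs.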

researchmap
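Exploring Data Processing and Analysis Approaches for Predicting Defect Occurrence on Semiconductor Manufacturing Lines

Sony Semiconductor Manufacturing Corporation

April 2019 - March 2020

Role: Principal investigator / Funding type: Industry-academia collaboration funding

Amount allocated: 2,730,000 yen (direct costs: 2,481,000 yen; indirect costs: 249,000 yen)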

researchmap
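Research on Digital Library Systems for Extraction, Visualization, and Recommendation of Experimental Information

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

高須淳宏, 正田備也

April 2015 - March 2018

Grant number: 15H02789

Amount allocated: 15,860,000 yen (direct costs: 12,200,000 yen; indirect costs: 3,660,000 yen)

On the task of extracting information from scholarly literature, we continued the text analysis research from fiscal year 2016 (Heisei 28). In fiscal 2016 we performed information extraction with CRFs; in fiscal 2017 we analyzed and extracted information from the body text of scholarly articles using deep learning, aiming to improve extraction accuracy. For extracting the experimental information contained in articles, we worked on analyzing tables that summarize experimental results. A table is basically an arrangement of cells in n rows and m columns, but some compound cells span multiple rows or columns. We devised a method that extracts cell boundaries based on the alignment of the text in the table, which makes it possible to extract information from irregular tables containing cells that span multiple rows or columns. In a performance evaluation on the evaluation corpus created for the table understanding competition held at the international conference ICDAR, the method was confirmed to be as accurate as the method that achieved the highest accuracy in the competition.
In the research on information recommendation, we studied embedding methods for users and items based on neural models. Training a model generally requires large-scale training data, but collecting training data from system users is not easy. We therefore examined ways to also use system usage logs, which can be collected from users relatively easily. We confirmed that a neural network that takes access log information as context can raise recommendation accuracy. We also found that using context about items may make it possible to extract detailed relations between items, such as "substitutable items" and "items that play complementary roles."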

researchmap
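Tiny Data Mining: Reconstructing Large-Scale Data with Probability Distributions as Bases

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C)

正田備也

April 2014 - March 2017

Grant number: 26330256

Amount allocated: 4,810,000 yen (direct costs: 3,700,000 yen; indirect costs: 1,110,000 yen)

This research aims to summarize large-scale data. It deals mainly with data written in characters, that is, text data, such as news articles, academic papers, and novels. Once text data becomes large, people can no longer read every item one by one, so we create summaries. The summaries this research produces are word lists. For example, from the word list "game, hit, pitcher, trade" we can tell that it represents the topic of baseball. The goal of this research is to extract many such word lists automatically from huge volumes of text data, so that one can see what is written without reading each document.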

researchmap
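Research on Information Alignment by Composing Probabilistic Generative Models

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

高須淳宏, 正田備也, 深川大路

April 2011 - March 2015

Grant number: 23300040

Amount allocated: 19,890,000 yen (direct costs: 15,300,000 yen; indirect costs: 4,590,000 yen)

This research aimed to build diverse methods for analyzing information with latent topic models, and constructed analysis models that also take into account the time attached to information and the relations between pieces of information. First, to use temporal and textual information together, we attached timestamps to documents and devised a topic model that generates text and timestamps jointly. We then extended it to a topic model that generates mutually linked documents, such as papers linked by citations. As an application, we built a prototype researcher recommendation system and showed experimentally that using diverse information improves the accuracy of recommending collaborators.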

researchmap
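Information Navigation Using Statistical Rhymes

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (B)

正田備也

2010 - 2011

Grant number: 22700150

Amount allocated: 4,030,000 yen (direct costs: 3,100,000 yen; indirect costs: 930,000 yen)

This research is based on the assumption that "even word co-occurrences that do not arise from semantic relatedness are useful as clues for gathering information if they occur with statistically significant frequency." We call such statistically significant co-occurrences "statistical rhymes." We aimed to extract word co-occurrences that occur with statistically significant frequency using Bayesian probabilistic models. In the end, we proposed a new LDA (latent Dirichlet allocation) type topic extraction method that, by unsupervised learning, automatically segments bibliographic information appearing at the end of papers or on researchers' Web sites into distinct bibliographic fields such as author names, paper title, journal name, and publication year. We also succeeded in improving the segmentation accuracy of the proposed model through semi-supervised learning.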

researchmap
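Sub-Pixel Analysis Algorithms for Remotely Sensed Images Using Hyperspectral Data

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C)

喜安千弥, 宮原末治, 正田備也, 堀田政二

2005 - 2007

Grant number: 17560376

Amount allocated: 3,740,000 yen (direct costs: 3,500,000 yen; indirect costs: 240,000 yen)

This research aims to develop methods for extracting sub-pixel information with high accuracy from hyperspectral images obtained by remote sensing. We worked to reduce misclassification caused by multiple categories being mixed within a pixel and to estimate the mixing ratios of categories within a pixel with high accuracy. First, as a classification method that takes into account context including neighboring pixels, we developed a method that uses spectral information to judge whether a pixel is mixed, narrows down the possible categories by considering spatial context, and then assigns the pixel to a plausible category. Second, to estimate within-pixel mixing ratios accurately even when the spectral characteristics of the targets vary across the image, we developed an algorithm that divides the whole image into a grid of small regions, treats the spectral characteristics as constant within each region, estimates the number of components and the component spectra from the observed data itself, and uses them to compute the within-pixel mixing ratios. Third, as an algorithm that is effective when the spectral variation in pixels consisting of a single category cannot be ignored, we developed a method that is given a small amount of training data in advance, uses it to select single-category pixels from the whole image, takes the single-category pixels surrounding each mixed pixel to be analyzed to estimate the component spectra, and then estimates the mixing ratios within that mixed pixel. In the final year, we organized the developed algorithms into a semi-supervised method, yielding algorithms that classify pixels and estimate within-pixel mixing ratios with high accuracy when the available training data is limited.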

researchmap
